Somatic Mutations

The goal of this notebook is to introduce you to the Somatic Mutations BigQuery table.

This table is based on the open-access somatic mutation calls available in MAF files at the DCC. In addition to uploading all current MAF files from the DCC, the mutations were also annotated using Oncotator. A subset of the columns in the underlying MAF files and a subset of the Oncotator outputs were then assembled in this table.

In addition, the ETL process includes several data-cleaning steps because many tumor types actually have multiple current MAF files and therefore potentially duplicate mutation calls. In some cases, a tumor sample may have had mutations called relative to both a blood-normal and an adjacent-tissue sample, and in other cases MAF files may contain mutations called on more than one aliquot from the same sample. Every effort was made to include all of the available data at the DCC while avoiding having multiple rows in the mutation table describing the same somatic mutation. Note, however, that if the same mutation was called by multiple centers and appeared in different MAF files, it may be described on muliple rows (as you will see later in this notebook). Furthermore, in some cases, the underlying MAF file may have been based on a multi-center mutationa-calling exercise, in which case you may see a list of centers in the Center field, eg "bcgsc.ca;broad.mit.edu;hgsc.bcm.edu;mdanderson.org;ucsc.edu".

In conclusion, if you are counting up the number of mutations observed in a sample or a patient or a tumor-type, be sure to include the necessary GROUP BY clause(s) in order to avoid double-counting!

As usual, in order to work with BigQuery, you need to import the bigquery module (gcp.bigquery) and you need to know the name(s) of the table(s) you are going to be working with:



In [1]:

    
import gcp.bigquery as bq
somatic_mutations_BQtable = bq.Table('isb-cgc:tcga_201607_beta.Somatic_Mutation_calls')

Let's start by taking a look at the table schema:



In [2]:

    
%bigquery schema --table $somatic_mutations_BQtable









    Out[2]:

That's a lot of fields! Let's dig in a bit further to see what is included in this table. For example let's count up the number of unique patients, tumor-samples, and normal-samples based on barcode identifiers:



In [3]:

    
%%sql --module count_unique

SELECT COUNT(DISTINCT $f,25000) AS n
FROM $t



In [4]:

    
fieldList = ['ParticipantBarcode', 'Tumor_SampleBarcode', 'Normal_SampleBarcode' ]
for aField in fieldList:
  field = somatic_mutations_BQtable.schema[aField]
  rdf = bq.Query(count_unique,t=somatic_mutations_BQtable,f=field).to_dataframe()
  print " There are %6d unique values in the field %s. " % ( rdf.iloc[0]['n'], aField)









    



 There are  10581 unique values in the field ParticipantBarcode. 
 There are  10698 unique values in the field Tumor_SampleBarcode. 
 There are  10581 unique values in the field Normal_SampleBarcode.

Now let's look at a few key fields and find the top-5 most frequent values in each field:



In [5]:

    
%%sql --module top_5_values

SELECT $f, COUNT(*) AS n
FROM $t
WHERE ( $f IS NOT NULL )
GROUP BY $f
ORDER BY n DESC
LIMIT 5

You can use the parameterized query defined above to find the top-5 most frequently occurring values for any field of interest, for example:



In [6]:

    
bq.Query(top_5_values,t=somatic_mutations_BQtable,f=somatic_mutations_BQtable.schema['Hugo_Symbol']).results().to_dataframe()









    Out[6]:






  
    
      
      Hugo_Symbol
      n
    
  
  
    
      0
      Unknown
      63962
    
    
      1
      TTN
      20183
    
    
      2
      MUC16
      11742
    
    
      3
      FLG
      11039
    
    
      4
      OBSCN
      6967



In [7]:

    
bq.Query(top_5_values,t=somatic_mutations_BQtable,f=somatic_mutations_BQtable.schema['Center']).results().to_dataframe()









    Out[7]:






  
    
      
      Center
      n
    
  
  
    
      0
      broad.mit.edu
      3798074
    
    
      1
      broad.mit.edu;hgsc.bcm.edu
      346100
    
    
      2
      broad.mit.edu;genome.wustl.edu
      253889
    
    
      3
      bcgsc.ca
      219239
    
    
      4
      bcgsc.ca;broad.mit.edu
      218444



In [8]:

    
bq.Query(top_5_values,t=somatic_mutations_BQtable,f=somatic_mutations_BQtable.schema['Mutation_Status']).results().to_dataframe()









    Out[8]:






  
    
      
      Mutation_Status
      n
    
  
  
    
      0
      Somatic
      5355991



In [9]:

    
bq.Query(top_5_values,t=somatic_mutations_BQtable,f=somatic_mutations_BQtable.schema['Protein_Change']).results().to_dataframe()









    Out[9]:






  
    
      
      Protein_Change
      n
    
  
  
    
      0
      p.M1I
      1679
    
    
      1
      p.V600E
      588
    
    
      2
      p.R132H
      533
    
    
      3
      p.M1V
      463
    
    
      4
      p.Q269Q
      398

Everyone probably recognizes the V600E mutation in the previous result, so let's use that well-known BRAF mutation as a way to explore what other information is available in this table.



In [10]:

    
%%sql --module find_BRAF_V600E

SELECT
  Tumor_SampleBarcode,
  Study,
  Hugo_Symbol,
  Genome_Change,
  Protein_Change
FROM
  $t
WHERE
  ( Hugo_Symbol="BRAF"
    AND Protein_Change="p.V600E" )
GROUP BY
  Tumor_SampleBarcode,
  Study,
  Hugo_Symbol,
  Genome_Change,
  Protein_Change
ORDER BY
  Study,
  Tumor_SampleBarcode



In [11]:

    
r = bq.Query(find_BRAF_V600E,t=somatic_mutations_BQtable).results()
r.to_dataframe()









    Out[11]:






  
    
      
      Tumor_SampleBarcode
      Study
      Hugo_Symbol
      Genome_Change
      Protein_Change
    
  
  
    
      0
      TCGA-YF-AA3L-01A
      BLCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      1
      TCGA-ZU-A8S4-01A
      CHOL
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      2
      TCGA-A6-2686-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      3
      TCGA-A6-3809-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      4
      TCGA-A6-5661-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      5
      TCGA-A6-5665-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      6
      TCGA-A6-6649-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      7
      TCGA-A6-6653-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      8
      TCGA-AA-3492-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      9
      TCGA-AA-3664-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      10
      TCGA-AA-3684-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      11
      TCGA-AA-3713-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      12
      TCGA-AA-3715-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      13
      TCGA-AA-3815-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      14
      TCGA-AA-3821-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      15
      TCGA-AA-3833-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      16
      TCGA-AA-3877-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      17
      TCGA-AA-3947-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      18
      TCGA-AA-3949-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      19
      TCGA-AA-3966-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      20
      TCGA-AA-A01D-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      21
      TCGA-AA-A01P-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      22
      TCGA-AA-A022-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      23
      TCGA-AA-A02R-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      24
      TCGA-AD-5900-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      25
      TCGA-AD-6889-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      26
      TCGA-AD-A5EJ-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      27
      TCGA-AM-5820-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      28
      TCGA-AM-5821-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      29
      TCGA-AU-6004-01A
      COAD
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      557
      TCGA-IM-A41Y-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      558
      TCGA-IM-A420-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      559
      TCGA-IM-A4EB-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      560
      TCGA-J8-A3NZ-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      561
      TCGA-J8-A3O2-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      562
      TCGA-J8-A3O2-06A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      563
      TCGA-J8-A3YD-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      564
      TCGA-J8-A3YE-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      565
      TCGA-J8-A3YF-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      566
      TCGA-J8-A3YG-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      567
      TCGA-J8-A3YH-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      568
      TCGA-J8-A3YH-06A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      569
      TCGA-J8-A42S-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      570
      TCGA-J8-A4HY-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      571
      TCGA-KS-A41F-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      572
      TCGA-KS-A41J-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      573
      TCGA-KS-A4I1-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      574
      TCGA-KS-A4I3-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      575
      TCGA-KS-A4I5-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      576
      TCGA-KS-A4I7-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      577
      TCGA-KS-A4I9-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      578
      TCGA-KS-A4IB-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      579
      TCGA-KS-A4IC-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      580
      TCGA-L6-A4EP-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      581
      TCGA-L6-A4EQ-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      582
      TCGA-L6-A4ET-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      583
      TCGA-L6-A4EU-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      584
      TCGA-MK-A4N6-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      585
      TCGA-MK-A4N7-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
    
      586
      TCGA-MK-A4N9-01A
      THCA
      BRAF
      g.chr7:140453136A>T
      p.V600E
    
  

587 rows × 5 columns

Let's count these mutations up by study (tumor-type):



In [12]:

    
%%sql

SELECT Study, COUNT(*) AS n
FROM $r
GROUP BY Study
HAVING n > 1
ORDER BY n DESC









    Out[12]:





    Study n
THCA 300
SKCM 212
COAD 50
LUAD 10
GBM 6
KIRP 2
LGG 2
    
(rows: 7, time: 0.8s,     3KB processed, job: job_EhSIiJqwnPY5ciQgmSyhTsTvNWU)

You may have noticed that in our earlier query, we did a GROUP BY to make sure that we didn't count the same mutation called on the same sample more than once. We might want to GROUP BY patient instead to see if that changes our counts -- we may have multiple samples from some patients.



In [13]:

    
%%sql --module find_BRAF_V600E_by_patient

SELECT
  ParticipantBarcode,
  Study,
  Hugo_Symbol,
  Genome_Change,
  Protein_Change
FROM
  $t
WHERE
  ( Hugo_Symbol="BRAF"
    AND Protein_Change="p.V600E" )
GROUP BY
  ParticipantBarcode,
  Study,
  Hugo_Symbol,
  Genome_Change,
  Protein_Change
ORDER BY
  Study,
  ParticipantBarcode



In [14]:

    
r = bq.Query(find_BRAF_V600E_by_patient,t=somatic_mutations_BQtable).results()



In [15]:

    
%%sql

SELECT Study, COUNT(*) AS n
FROM $r
GROUP BY Study
HAVING n > 1
ORDER BY n DESC









    Out[15]:





    Study n
THCA 294
SKCM 211
COAD 50
LUAD 9
GBM 6
KIRP 2
LGG 2
    
(rows: 7, time: 0.8s,     3KB processed, job: job_wEBpxuzXwUPQ4A919iDBpYN7Zo4)

When we counted the number of mutated samples, we found 261 THCA samples, but when we counted the number of patients, we found 258 THCA patients, so let's see what's going on there.



In [16]:

    
%%sql

SELECT
  ParticipantBarcode,
  COUNT(*) AS m
FROM (
  SELECT
    ParticipantBarcode,
    Tumor_SampleBarcode,
    COUNT(*) AS n
  FROM
    $somatic_mutations_BQtable
  WHERE
    ( Hugo_Symbol="BRAF"
      AND Protein_Change="p.V600E"
      AND Study="THCA" )
  GROUP BY
    ParticipantBarcode,
    Tumor_SampleBarcode,
    )
GROUP BY
  ParticipantBarcode
HAVING
  m > 1
ORDER BY
  m DESC









    Out[16]:





    ParticipantBarcode m
TCGA-J8-A3YH 2
TCGA-EM-A2P1 2
TCGA-EM-A3SU 2
TCGA-J8-A3O2 2
TCGA-EM-A2CS 2
TCGA-DE-A4MD 2
    
(rows: 6, time: 1.0s,   286MB processed, job: job_77CJKLhHZId2oTUdPcV9wfrAeoA)

Sure enough, we see that the same mutation is reported twice for each of these three patients. Let's look at why:



In [17]:

    
%%sql

SELECT
  ParticipantBarcode,
  Tumor_SampleBarcode,
  Tumor_SampleTypeLetterCode,
  Normal_SampleBarcode,
  Normal_SampleTypeLetterCode,
  Center,
FROM
  $somatic_mutations_BQtable
WHERE
  ( Hugo_Symbol="BRAF"
    AND Protein_Change="p.V600E"
    AND Study="THCA"
    AND ParticipantBarcode="TCGA-EM-A2P1" )
ORDER BY
  Tumor_SampleBarcode,
  Normal_SampleBarcode,
  Center









    Out[17]:





    ParticipantBarcode Tumor_SampleBarcode Tumor_SampleTypeLetterCode Normal_SampleBarcode Normal_SampleTypeLetterCode Center
TCGA-EM-A2P1 TCGA-EM-A2P1-01A TP TCGA-EM-A2P1-10A NB bcgsc.ca;broad.mit.edu;hgsc.bcm.edu
TCGA-EM-A2P1 TCGA-EM-A2P1-06A TM TCGA-EM-A2P1-10A NB bcgsc.ca;broad.mit.edu;hgsc.bcm.edu
    
(rows: 2, time: 1.1s,   524MB processed, job: job_BNWbHokrmzPc3fY8PfCPiHjiSTM)

Aha! not only did this patient provide both a primary tumor (TP) and a metastatic (TM) sample, but we have mutation calls from three different centers.

Finally, let's pick out one of these mutations and see what some of the other fields in this table can tell us:



In [18]:

    
%%sql

SELECT
  ParticipantBarcode,
  Tumor_SampleTypeLetterCode,
  Normal_SampleTypeLetterCode,
  Study,
  Center,
  Variant_Type,
  Variant_Classification,
  Genome_Change,
  cDNA_Change,
  Protein_Change,
  UniProt_Region,
  COSMIC_Total_Alterations_In_Gene,
  DrugBank
FROM
  $somatic_mutations_BQtable
WHERE
  ( Hugo_Symbol="BRAF"
    AND Protein_Change="p.V600E"
    AND Study="THCA"
    AND ParticipantBarcode="TCGA-EM-A2P1"
    AND Tumor_SampleTypeLetterCode="TP"
    AND Center="broad.mit.edu" )









    Out[18]:





    
    
(rows: 0, time: 0.8s,   778MB processed, job: job__anjZ4ojpEEynW7YDH6LTbgaQoc)

When working with variants or mutations, there is another public BigQuery table that you might find useful. Developed by Tute Genomics, this comprehensive, publicly-available database of over 8.5 billion known variants was announced earlier this year. This table includes several types of annotations and scores, such ase Polyphen2 and MutationAssessor, and a proprietary "Tute score" which estimates whether a SNP or indel is likely to be associate with Mendelian phenotypes.

For example, you can look up all exonic BRAF mutations in the TuteTable in less than 20 seconds:



In [19]:

    
%%sql --module BRAF_TUTE

SELECT
  Chr,
  Start,
  Func,
  Gene,
  AA,
  Polyphen2_HDIV_score,
  Polyphen2_HVAR_score,
  MutationAssessor_score,
  TUTE
FROM
  [silver-wall-555:TuteTable.hg19]
WHERE
  ( Gene="BRAF"
    AND Func="exonic" )
ORDER BY
  Start ASC



In [20]:

    
tuteBRAFscores = bq.Query(BRAF_TUTE).results().to_dataframe()
tuteBRAFscores.describe()









    



/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)






    Out[20]:






  
    
      
      Start
      Polyphen2_HDIV_score
      Polyphen2_HVAR_score
      MutationAssessor_score
      TUTE
    
  
  
    
      count
      6.903000e+03
      4985.000000
      4985.000000
      4976.000000
      6903.000000
    
    
      mean
      1.404945e+08
      0.556226
      0.413867
      1.247881
      0.332186
    
    
      std
      4.577719e+04
      0.417165
      0.395638
      1.182233
      0.306410
    
    
      min
      1.404344e+08
      0.000000
      0.000000
      -2.770000
      0.000000
    
    
      25%
      1.404540e+08
      NaN
      NaN
      NaN
      0.108000
    
    
      50%
      1.404874e+08
      NaN
      NaN
      NaN
      0.196700
    
    
      75%
      1.405087e+08
      NaN
      NaN
      NaN
      0.568100
    
    
      max
      1.406245e+08
      1.000000
      1.000000
      4.505000
      0.931700

Let's go back to the TCGA somatic mutations table and pull out all BRAF mutations and then join them with the matching mutations in the Tute Table so that we can compare the distribution of scores (eg MutationAssessor and TUTE) between the somatic mutations seen in TCGA and the larger set of variants contained in the Tute Table.



In [21]:

    
%%sql --module TCGA_BRAF

SELECT
  Hugo_Symbol,
  Protein_Change,
  MutationAssessor_score,
  TUTE
FROM (
  SELECT
    Hugo_Symbol,
    Protein_Change
  FROM
    $t
  WHERE
    ( Hugo_Symbol="BRAF" )
  GROUP BY
    Hugo_Symbol,
    Protein_Change ) AS tcga
JOIN (
  SELECT
    Gene,
    AA,
    MutationAssessor_score,
    TUTE
  FROM
    [silver-wall-555:TuteTable.hg19]
  WHERE
    ( Gene="BRAF" ) ) AS tute
ON
  tcga.Hugo_Symbol=tute.Gene
  AND tcga.Protein_Change=tute.AA



In [22]:

    
tcgaBRAFscores = bq.Query(TCGA_BRAF,t=somatic_mutations_BQtable).results().to_dataframe()
tcgaBRAFscores.describe()









    Out[22]:






  
    
      
      MutationAssessor_score
      TUTE
    
  
  
    
      count
      194.000000
      298.000000
    
    
      mean
      1.531443
      0.337101
    
    
      std
      1.376454
      0.321675
    
    
      min
      -1.285000
      0.000000
    
    
      25%
      NaN
      0.096200
    
    
      50%
      NaN
      0.188800
    
    
      75%
      NaN
      0.645875
    
    
      max
      4.475000
      0.928100



In [23]:

    
import numpy as np
import matplotlib.pyplot as plt



In [24]:

    
plt.hist(tuteBRAFscores['TUTE'],bins=50,normed=True,color='red',alpha=0.6,label='all variants');
plt.hist(tcgaBRAFscores['TUTE'],bins=50,normed=True,color='blue',alpha=0.4,label='TCGA somatic mutations');
plt.legend(loc='upper right');
plt.xlabel('TUTE score');
plt.ylabel('probability');



In [25]:

    
plt.hist(tuteBRAFscores['MutationAssessor_score'],bins=45,range=[-4,5],normed=True,color='red',alpha=0.6,label='all variants');
plt.hist(tcgaBRAFscores['MutationAssessor_score'],bins=45,range=[-4,5],normed=True,color='blue',alpha=0.4,label='TCGA somatic mutations');
plt.legend(loc='upper right');
plt.xlabel('MutationAssessor score');
plt.ylabel('probability');

Both of these plots suggest that some of the somatic BRAF mutations observed in TCGA tumor samples are scored as more deleterious by both TUTE and MutationAssessor. In the TUTE histogram, a larger fraction of somatic mutations also get a score of 0.

Note that in these histograms, each count represents a single variant, ie a specific protein change. Mutations that are seen across multiple patients are not being counted multiple times.



In [ ]:

	Center	n
0	broad.mit.edu	3798074
1	broad.mit.edu;hgsc.bcm.edu	346100
2	broad.mit.edu;genome.wustl.edu	253889
3	bcgsc.ca	219239
4	bcgsc.ca;broad.mit.edu	218444

	Tumor_SampleBarcode	Study	Hugo_Symbol	Genome_Change	Protein_Change
0	TCGA-YF-AA3L-01A	BLCA	BRAF	g.chr7:140453136A>T	p.V600E
1	TCGA-ZU-A8S4-01A	CHOL	BRAF	g.chr7:140453136A>T	p.V600E
2	TCGA-A6-2686-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
3	TCGA-A6-3809-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
4	TCGA-A6-5661-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
5	TCGA-A6-5665-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
6	TCGA-A6-6649-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
7	TCGA-A6-6653-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
8	TCGA-AA-3492-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
9	TCGA-AA-3664-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
10	TCGA-AA-3684-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
11	TCGA-AA-3713-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
12	TCGA-AA-3715-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
13	TCGA-AA-3815-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
14	TCGA-AA-3821-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
15	TCGA-AA-3833-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
16	TCGA-AA-3877-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
17	TCGA-AA-3947-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
18	TCGA-AA-3949-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
19	TCGA-AA-3966-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
20	TCGA-AA-A01D-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
21	TCGA-AA-A01P-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
22	TCGA-AA-A022-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
23	TCGA-AA-A02R-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
24	TCGA-AD-5900-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
25	TCGA-AD-6889-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
26	TCGA-AD-A5EJ-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
27	TCGA-AM-5820-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
28	TCGA-AM-5821-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
29	TCGA-AU-6004-01A	COAD	BRAF	g.chr7:140453136A>T	p.V600E
...	...	...	...	...	...
557	TCGA-IM-A41Y-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
558	TCGA-IM-A420-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
559	TCGA-IM-A4EB-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
560	TCGA-J8-A3NZ-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
561	TCGA-J8-A3O2-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
562	TCGA-J8-A3O2-06A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
563	TCGA-J8-A3YD-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
564	TCGA-J8-A3YE-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
565	TCGA-J8-A3YF-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
566	TCGA-J8-A3YG-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
567	TCGA-J8-A3YH-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
568	TCGA-J8-A3YH-06A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
569	TCGA-J8-A42S-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
570	TCGA-J8-A4HY-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
571	TCGA-KS-A41F-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
572	TCGA-KS-A41J-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
573	TCGA-KS-A4I1-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
574	TCGA-KS-A4I3-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
575	TCGA-KS-A4I5-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
576	TCGA-KS-A4I7-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
577	TCGA-KS-A4I9-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
578	TCGA-KS-A4IB-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
579	TCGA-KS-A4IC-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
580	TCGA-L6-A4EP-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
581	TCGA-L6-A4EQ-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
582	TCGA-L6-A4ET-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
583	TCGA-L6-A4EU-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
584	TCGA-MK-A4N6-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
585	TCGA-MK-A4N7-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E
586	TCGA-MK-A4N9-01A	THCA	BRAF	g.chr7:140453136A>T	p.V600E

ParticipantBarcode	m
TCGA-J8-A3YH	2
TCGA-EM-A2P1	2
TCGA-EM-A3SU	2
TCGA-J8-A3O2	2
TCGA-EM-A2CS	2
TCGA-DE-A4MD	2

ParticipantBarcode	Tumor_SampleBarcode	Tumor_SampleTypeLetterCode	Normal_SampleBarcode	Normal_SampleTypeLetterCode	Center
TCGA-EM-A2P1	TCGA-EM-A2P1-01A	TP	TCGA-EM-A2P1-10A	NB	bcgsc.ca;broad.mit.edu;hgsc.bcm.edu
TCGA-EM-A2P1	TCGA-EM-A2P1-06A	TM	TCGA-EM-A2P1-10A	NB	bcgsc.ca;broad.mit.edu;hgsc.bcm.edu

	Start	Polyphen2_HDIV_score	Polyphen2_HVAR_score	MutationAssessor_score	TUTE
count	6.903000e+03	4985.000000	4985.000000	4976.000000	6903.000000
mean	1.404945e+08	0.556226	0.413867	1.247881	0.332186
std	4.577719e+04	0.417165	0.395638	1.182233	0.306410
min	1.404344e+08	0.000000	0.000000	-2.770000	0.000000
25%	1.404540e+08	NaN	NaN	NaN	0.108000
50%	1.404874e+08	NaN	NaN	NaN	0.196700
75%	1.405087e+08	NaN	NaN	NaN	0.568100
max	1.406245e+08	1.000000	1.000000	4.505000	0.931700

	MutationAssessor_score	TUTE
count	194.000000	298.000000
mean	1.531443	0.337101
std	1.376454	0.321675
min	-1.285000	0.000000
25%	NaN	0.096200
50%	NaN	0.188800
75%	NaN	0.645875
max	4.475000	0.928100